Feature Vector Quality and Distributional Similarity

Authors

  • Maayan Zhitomirsky-Geffet
  • Ido Dagan
Abstract

We suggest a new goal and evaluation criterion for word similarity measures. The new criterion, meaning-entailing substitutability, fits the needs of semantic-oriented NLP applications and can be evaluated directly (independent of an application) at a good level of human agreement. Motivated by this semantic criterion, we analyze the empirical quality of distributional word feature vectors and its impact on word similarity results, proposing an objective measure for evaluating feature vector quality. Finally, a novel feature weighting and selection function is presented, which yields superior feature vectors and better word similarity performance.

Similar resources

Articles: Bootstrapping Distributional Feature Vector Quality

This article presents a novel bootstrapping approach for improving the quality of feature vector weighting in distributional word similarity. The method was motivated by attempts to utilize distributional similarity for identifying the concrete semantic relationship of lexical entailment. Our analysis revealed that a major reason for the rather loose semantic similarity obtained by distribution...

Learning Thesaurus Relations from Distributional Features

In distributional semantics, words are represented by aggregated context features. The similarity of words can be computed by comparing their feature vectors. Thus, we can predict whether two words are synonymous or similar with respect to some other semantic relation. We will show on six different datasets of pairs of similar and non-similar words that a supervised learning algorithm on feature...

Exploring the effect of semantic similarity for Phrase-based Machine Translation

The paper investigates the use of semantic similarity scores as a feature in a phrase-based machine translation system. We propose the use of partial least squares regression to learn bilingual word embeddings using compositional distributional semantics. The model outperforms the baseline system, as shown by an increase in BLEU score. We also show the effect of varying the vector dimens...

On the effect of word frequency on distributional similarity

The dependency of word similarity in vector space models on the frequency of words has been noted in a few studies, but has received very little attention. We study the influence of word frequency in a set of 10 000 randomly selected word pairs for a number of different combinations of feature weighting schemes and similarity measures. We find that the similarity of word pairs for all methods, ...

Monolingual Distributional Similarity for Text-to-Text Generation

Previous work on paraphrase extraction and application has relied on either parallel datasets, or on distributional similarity metrics over large text corpora. Our approach combines these two orthogonal sources of information and directly integrates them into our paraphrasing system’s log-linear model. We compare different distributional similarity feature-sets and show significant improvements...

Publication date: 2004